Abstract: Regenerating codes are a class of codes proposed 
for efficient repair of failed nodes in distributed storage systems. 
In this paper, we address the fundamental problem of handling 
errors and erasures which may occur during data reconstruction 
and node repair in regenerating codes. There are numerous 
scenarios which motivate this problem such as time-critical data 
recovery, dynamic load balancing, and security from malicious 
adversaries. We provide outer bounds, and explicit regenerating 
codes achieving these bounds for a wide range of system parameters. 
This also establishes the capacity of these systems in these 
parameter regimes. 

I. Introduction 

Distributed storage systems play a vital role in today's age 
of big data. For cost considerations, these storage systems 
often employ commodity hardware, which makes failures a 
norm rather than an exception. In order to safeguard the 
precious data against such failures, the data is typically stored 
in a redundant manner. In this paper, we consider a distributed 
storage system consisting of n storage nodes in a network, 
each having a capacity to store $\alpha$ symbols over a finite field 
$\mathbb{F}_q$. Data comprising B symbols (the message) is to be stored 
across these n nodes. An end-user (called a data collector) 
must be able to reconstruct the entire message by downloading 
the data stored in any k of these n nodes. It follows that such 
a system can tolerate failure of any $(n - k)$ nodes, and under 
solely this requirement, can be realised using any [n, k] MDS 
code. 

A second important aspect of a distributed storage system 
is the handling of node failures. When a storage node fails, 
it is replaced by a new, empty node. The replacement node 
is required to obtain the data that was previously stored in 
the failed node, by downloading data from the remaining 
nodes in the network. We will term this process as repair 
or regeneration of a node. A typical means of accomplishing 
this is to download the entire message from the network, and 
extract the desired data from it. However, downloading the 
entire message, when it eventually stores only a fraction $1/k$ of 
it, is clearly wasteful of the network resources. 

Fig. 1: An example of the system parameters under a regenerating code (in the absence of errors/erasures). The system comprises n = 5 storage nodes, data-reconstruction is possible from any k = 2 nodes, and node-regeneration from any d = 3 nodes. 

'Regenerating codes' [1] are a class of codes that aim to 
reduce the amount of download during regeneration, while 
retaining the storage efficiency of traditional MDS codes. 
Under the operation of a regenerating code, a replacement 
node connects to any d existing nodes (termed helper nodes), 
and downloads $\beta$ symbols from each. This setting is illustrated 
in Fig. 1. Here, the total amount of data $d\beta$ downloaded for 
regeneration is much smaller than the total size of the message 
B. Further, it is shown in [1] that the parameters associated 
with a regenerating code must necessarily satisfy^1 

$$B \le \sum_{i=0}^{k-1} \min\left(\alpha, (d-i)\beta\right). \qquad (1)$$

A regenerating code is said to be optimal if it satisfies this 
bound with equality. Since both storage and bandwidth come 
at a cost, it is naturally desirable to minimize both $\alpha$ as well 
as $\beta$. However, it can be deduced (see [1]) that achieving 
equality in (1), for fixed values of B and [n, k, d], leads 
to a tradeoff between the storage space $\alpha$ and the amount of 
download for regeneration $d\beta$. The two extreme points in this 
tradeoff are termed the minimum storage regenerating (MSR) 
and minimum bandwidth regenerating (MBR) points. These 
points have been well studied in the literature, and several 
explicit constructions of codes operating at these points are 
available [2]-[8]. It has also been shown in [8] that essentially 
all other points on the tradeoff curve are not achievable. 
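
As a numerical check of bound (1) and of the two extreme points, the following Python sketch evaluates the right-hand side of (1) and the MSR and MBR operating points for given B, k and d. The function names are illustrative and not notation from the paper; the closed forms for $(\alpha, \beta)$ are the ones obtained by solving (1) with equality at each extreme point.

```python
from fractions import Fraction

def cutset_bound(k, d, alpha, beta):
    """Right-hand side of bound (1): sum over i = 0..k-1 of min(alpha, (d - i) * beta)."""
    return sum(min(alpha, (d - i) * beta) for i in range(k))

def msr_point(B, k, d):
    """Minimum-storage point: alpha = B / k and d*beta = alpha + (k - 1)*beta."""
    alpha = Fraction(B, k)
    beta = Fraction(B, k * (d - k + 1))
    return alpha, beta

def mbr_point(B, k, d):
    """Minimum-bandwidth point: alpha = d*beta and B = (k*d - k*(k - 1)/2) * beta."""
    beta = Fraction(2 * B, 2 * k * d - k * (k - 1))
    return d * beta, beta

if __name__ == "__main__":
    for name, point in (("MSR", msr_point), ("MBR", mbr_point)):
        alpha, beta = point(6, 3, 4)
        print(name, alpha, beta, cutset_bound(3, 4, alpha, beta))
```

At both points the bound evaluates to exactly B, which is what optimality means here; exact rational arithmetic is used so that non-integer $\beta$ values do not introduce rounding.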

The amount of data downloaded to regenerate the data 
stored in a single node is of interest in several other scenarios. 
For instance, read requests to a temporarily unavailable storage 
node can be handled by regenerating the data stored in that 
node: such a scheme is indeed employed in RAID enabled 
Hadoop Distributed File Systems. The node regeneration prop- 
erty can also be employed to disseminate data in a peer-to-peer 
content distribution network (see, for example, [9]). 

A second example is in handling of data hotspots efficiently, 
by provisioning new nodes, and treating transfer of the data 
to the new nodes as node regeneration. 

^1 The authors in [1] consider 'functional' repair where a replacement node 
only needs to satisfy further reconstructions and repairs. A practical, but more 
stringent requirement is that of 'exact' repair where it stores data identical to 
that in the failed node. Throughout this paper, we consider only exact repair. 

In this paper, we address the problem of handling errors 
and erasures in distributed storage networks using regenerating 
codes. In particular, we are interested in codes that can 
perform data reconstruction and efficient node-regeneration in 
the presence of errors and erasures at the nodes or in the 
links. There are numerous practical scenarios which motivate 
this problem, two of which we discuss below. 

Scenario 1 (Seamless reconfiguration): Consider regener- 
ation of a node, accomplished by downloading data from a 
subset of d other helper nodes. In the midst of this operation, 
suppose one of these d nodes becomes unavailable (for exam- 
ple, due to a busy server at that node, or a bad communication 
channel). The replacement node would want to switch to 
another helper node instead. However, in general, under an 
(exact) regenerating code, the data passed by a helper node is a 
function of the identity of the $(d - 1)$ other helper nodes (see 
[10]). This would render the data already downloaded from 
the other $(d - 1)$ nodes useless. Moreover, the reconfiguration 
would entail additional communication overheads with the 
other helper nodes informing them of the change. It is thus of 
interest to construct codes that allow seamless reconfiguration 
of the helper nodes, allowing replacement of bad nodes 
without additional downloads and overheads. 

This property can be modelled as the ability of the regen- 
erating code to handle erasures on the transmission links, as 
described subsequently in Section II. 

Scenario 2 (Security): Consider regeneration of data stored 
in a node when a subset of the nodes in the system could 
be compromised to malicious adversaries. In this case, all 
symbols passed by the compromised nodes contain (adversar- 
ial) errors, and need to be corrected for. Moreover, one may 
also wish to detect errors in order to trace the location of the 
adversaries. 

The aspect of security in distributed storage systems employing 
regenerating codes is studied in [11], where the authors 
provide an outer bound for the total amount of data that can be 
stored securely in the presence of malicious adversaries. They 
also show that the code constructed in [8] achieves this bound 
with an appropriate choice of the underlying MDS code, for 
the case $d = n - 1$ at the MBR point. However, apart from 
this case, no other constructions of secure regenerating codes 
are known in the literature. 

While writing this paper, we came across a contemporaneous 
and independent work [12] that is related to the 
present paper, and deals with Byzantine fault tolerance using 
Product-Matrix codes [2]. The authors use a CRC to check the 
integrity of data during regeneration and reconstruction, and 
a feedback scheme to iteratively correct any errors detected. However, CRC 
based schemes are not applicable in certain settings such as 
protection against malicious adversaries, since the CRC can 
also be corrupted by the adversary. 

In the present paper, we adopt a fundamental approach to 
the handling of errors and erasures in regenerating codes. We 
consider a system model that accounts for the occurrence of errors 
and erasures in distributed storage systems. We present outer 
bounds on the storage and bandwidth requirements under this 
setting. We also provide explicit code constructions achieving 
these bounds for the parameters (i) MSR, all $[n, k, d \ge 2k - 2]$ 
and (ii) MBR, all $[n, k, d]$. This establishes the capacity of 
such systems for these parameters. Moreover, as a special case, 
we also establish the capacity of regenerating codes in the 
presence of malicious adversaries (as considered in [11]) for 
these parameters, which had remained open. The decoding algorithms 
have a complexity identical to that of Reed-Solomon 
codes. The codes presented here are based on a 'Product-Matrix' 
construction introduced in our previous work [2]. 

The rest of the paper is organized as follows. The system 
model considered in the paper is described in Section II, and 
outer bounds for this model are also provided in this section. 
Explicit constructions of error-resilient regenerating codes are 
provided in Section III. 

II. System Model 

We consider a block-based model where the message is 
divided into blocks, and there is no coding across the blocks. 
All operations of encoding, decoding and regeneration are per- 
formed independently across the blocks. Thus the regenerating 
code parameters (for the error-free case) described in Section I 
can be considered as pertaining to a single block of data. 
More concretely, we consider a block to consist of B message 
symbols, and the storage capacity in each of the n nodes to 
be $\alpha$ symbols per block. One can reconstruct the B message 
symbols by downloading the data pertaining to this block from 
any subset of k nodes, and regenerate the data stored in any 
node by downloading $\beta$ symbols each (pertaining to this block) 
from any d nodes. 

We further assume that the granularity of any error or 
erasure is one block. In other words, we assume that during 
node-regeneration, all the $\beta$ symbols passed by a helper 
node suffer the same fate, i.e., either they are all erased, or 
all are in error, or all are perfectly received. An identical 
assumption is made for the $\alpha$ symbols passed by any node 
during reconstruction, i.e., all the $\alpha$ symbols are assumed to 
suffer the same fate. 

The reason for this assumption is to disallow arbitrarily 
scattered errors in the model, since codes attempting to 
guard against such errors would require significantly larger 
overheads. Moreover, the block-error assumption accurately 
models several scenarios of interest, when the size of the block 
is chosen judiciously, as illustrated below. 

Consider the scenario of seamless reconfiguration discussed 
in Section I. The size of a packet during reconstruction and 
regeneration can be assumed to be a multiple of $\alpha$ and $\beta$ 
(since the packet size will usually be much larger than these 
parameters). Thus, when a packet is delayed or dropped, all the 
$\beta$ (or $\alpha$) symbols corresponding to a block can be considered 
as erased. 

In the security scenario, to account for compromise of any 
node or link to a malicious adversary, one needs to protect 
against corruption of possibly the entire data on that node or 
link (see [11]: 'omniscient adversary'). Thus, we can model 
this scenario under our framework by simply considering the 
entire data as a single block. 

We now formally define the error/erasure handling capability 
of a regenerating code. 

Fig. 2: An example of the system parameters under a resilient 
regenerating code. The code depicted is (s = 1, t = 0)-resilient, 
i.e., it can correct up to one erasure during reconstruction 
and node-regeneration. To facilitate this, a connectivity 
of $\kappa = 3 = k + 1$ is allowed during reconstruction, and of 
$\Delta = 3 = d + 1$ during regeneration. 



Definition 1 ((s, t)-resilient code): A regenerating code is 
(s, t)-resilient if it can correct up to s erasures and t errors 
during regeneration as well as reconstruction. 

To operate in an error/erasure-prone setting, it is clear 
that one needs to download additional data. One way to obtain 
more data during regeneration and reconstruction is to connect 
to a larger number of nodes, and we choose this approach. 
Precisely, we allow an additional connectivity of $\epsilon$, and denote 
the number of nodes connected to during regeneration as 
$\Delta = d + \epsilon_1$, and during reconstruction as $\kappa = k + \epsilon_2$. The 
parameters $\epsilon_1$ and $\epsilon_2$ depend on the error/erasure correcting 
capability expected of the system. Fig. 2 illustrates the 
setting for a (1, 0)-resilient regenerating code. 

We now provide a bound on the capacity of resilient 
regenerating codes. It is easy to see that the bound provided 
in [11] for security against 'omniscient adversaries' is also 
applicable to general errors in our block-based setting^2. The 
following theorem extends it to the case of erasures and errors. 

Theorem 1: An (s, t)-resilient regenerating code, connecting 
to $\Delta$ and $\kappa$ nodes for regeneration and reconstruction respectively, 
must satisfy 

$$B \le \sum_{i=0}^{k-1} \min\left(\alpha, (d-i)\beta\right), \qquad (2)$$

where $d = \Delta - s - 2t$ and $k = \kappa - s - 2t$. 

Proof (sketch): The bound can be derived either using 
cut-set arguments in an information flow graph as in fill or 
using information theoretic arguments as in [8]. Details are 
omitted due to lack of space. ■ 

Remark 1: Observe that this bound is identical to that 
in (1), for the choice of $\epsilon_1 = \epsilon_2 = s + 2t$. 

We will call an (s, t)-resilient regenerating code optimal 
if it meets the bound in Theorem 1. In the next section 
we present explicit constructions of optimal (s, t)-resilient 
regenerating codes. These codes satisfy (2) with equality, 
and have $\Delta = d + s + 2t$ 
and $\kappa = k + s + 2t$. 

^2 The notation k, d and b in [11] corresponds to $\kappa$, $\Delta$ and t respectively in 
this paper. 



An appealing feature of these codes is that they are resilient 
simultaneously for all values of s and t. This feature is 
particularly useful in the security scenario since it obviates 
the need of predicting the possible extent of compromise 
beforehand. 
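
The bookkeeping behind Definition 1, Theorem 1 and the simultaneous-resilience property can be sketched in a few lines of Python. The function names below are illustrative assumptions; the only facts used are that each erasure costs one extra connection and each error two, and that bound (2) is bound (1) evaluated at $d = \Delta - s - 2t$ and $k = \kappa - s - 2t$.

```python
def resilient_connectivity(k, d, s, t):
    """Connectivity (Delta, kappa) of an optimal (s, t)-resilient code."""
    extra = s + 2 * t          # one extra node per erasure, two per error
    return d + extra, k + extra

def resilient_bound(Delta, kappa, s, t, alpha, beta):
    """Right-hand side of bound (2), i.e. bound (1) at the reduced d and k."""
    d = Delta - s - 2 * t
    k = kappa - s - 2 * t
    return sum(min(alpha, (d - i) * beta) for i in range(k))

def correctable_pairs(extra):
    """All (s, t) with s + 2t <= extra: the erasure/error mixes that a fixed
    additional connectivity `extra` can absorb simultaneously."""
    return [(s, t) for t in range(extra // 2 + 1)
                   for s in range(extra - 2 * t + 1)]

if __name__ == "__main__":
    print(resilient_connectivity(k=2, d=3, s=1, t=0))
    print(resilient_bound(4, 3, 1, 0, alpha=3, beta=1))
    print(correctable_pairs(2))
```

For example, an additional connectivity of 2 covers two erasures, or one erasure, or a single error, without the decoder needing to know in advance which of these will occur.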

Before moving on to the constructions, we briefly digress 
to explore some connections with network coding. 

Relation to Network Coding: The regenerating codes prob- 
lem described above, if relaxed to the requirement of re- 
generation of only the systematic nodes, turns out to be a 
non-multicast network coding problem. While the occurrence 
of errors and erasures in multicast network coding is well 
studied in the literature [13]-[15], the results of the present 
paper lead to a class of non-multicast networks for which 
the error/erasure capacity of the network can be achieved by 
codes that are linear, explicit, and deterministic. Moreover, the 
requirement of (exact) regeneration of the parity nodes, which 
store functions of the message, presents us with additional 
constraints as compared to the traditional network coding 
paradigm. The codes presented in this paper also optimally 
handle these additional requirements. 

III. Error-resilient Regenerating codes 

We provide explicit constructions of error-resilient 
regenerating codes for 

1) MSR, all parameters $[n, k, d \ge 2k - 2]$, and 

2) MBR, all parameters $[n, k, d]$, 

which meet the outer bound provided in Theorem 1. Thus, 
this also establishes the capacity of such a system for these 
parameter values. These codes are based on Product-Matrix 
(PM) codes that were introduced in our previous work [2]. 
Each of the constructions presented is (s, t)-resilient for all 
values of s and t simultaneously^3. 

We begin with a description and code constructions for the 
minimum storage setting, and subsequently present the minimum 
bandwidth case. In both cases, we first briefly describe 
the Product-Matrix code construction for the error/erasure-free 
case [2], and then prove its error and erasure correcting 
capability. 

A. (s, t)-resilient MSR Codes 

MSR codes use the minimum possible storage at each node. 
Since a data-collector connecting to any k nodes should be 
able to reconstruct all the B message symbols, each node must 
necessarily store at least a fraction $1/k$ of the entire data. Hence 
for an MSR code we have $\alpha = B/k$. To meet the bound in (1) 
with equality (in the absence of errors/erasures), an MSR code 
must satisfy 

$$B = k\alpha, \qquad d\beta = \alpha + (k-1)\beta. \qquad (3)$$



In this section we present explicit constructions of MSR 
codes for all parameter values $[n, k, d \ge 2k - 2]$ which are 
(s, t)-resilient, for all s and t simultaneously. An example of 
a (1, 0)-resilient MSR code is provided in Fig. 3. 

^3 Provided, of course, that the desired connectivity is available, i.e., $\Delta \le n - 1$, $\kappa \le n$. 

Fig. 3: An example of a (1, 0)-resilient MSR code with parameters n = 6, B' = 6, $\alpha' = 2$, $\beta' = 1$. Also depicted is an 
instance of one erasure correction by connecting to $\Delta = d + 1$ nodes during regeneration of the data stored in node 1. 



The code is designed for the case $d = 2k - 2$, and this can 
be extended to any $d \ge 2k - 2$ via the shortening technique for 
MSR codes provided in [2], [4]. For the case of $d = 2k - 2$, 
from (3) we get 

$$\alpha = (k-1)\beta, \qquad B = k(k-1)\beta. \qquad (4)$$

Since $\alpha$ and B both are multiples of $\beta$, we obtain the optimal 
code for the desired parameters $(B, \alpha, \beta)$ by first constructing 
an optimal code for the case 

$$\alpha' = (k-1), \qquad B' = k(k-1) = \alpha'(\alpha'+1), \qquad \beta' = 1, \qquad (5)$$

and then concatenating this code $\beta$ times in parallel. 

The PM-MSR code in [2] can be described in terms of 
an $(n \times \alpha')$ code matrix $C = \Psi M$, with the $i$-th row of 
C containing the $\alpha'$ symbols stored in node i. The $(n \times d)$ 
encoding matrix $\Psi$ is of the form $\Psi = [\Phi \;\; \Lambda\Phi]$, where $\Phi$ is an 
$(n \times \alpha')$ matrix and $\Lambda$ is an $(n \times n)$ diagonal matrix satisfying: 
(a) any $\alpha'$ rows of $\Phi$ are linearly independent, (b) any d rows 
of $\Psi$ are linearly independent, and (c) the diagonal elements 
of $\Lambda$ are all distinct. The choice of the matrix $\Psi$ governs the 
choice of the finite field $\mathbb{F}_q$, e.g., choosing $\Psi$ as Vandermonde 
(carefully chosen to satisfy the condition on $\Lambda$) permits any 
$q \ge \alpha' n$. The $((d = 2\alpha') \times \alpha')$ message matrix M is of the 
form $M = [S_1 \;\; S_2]^t$, where $S_1$ and $S_2$ are $(\alpha' \times \alpha')$ symmetric 
matrices. The superscript t is used to denote the transpose of a 
vector or matrix. The two matrices $S_1$ and $S_2$ together contain 
$\alpha'(\alpha'+1)$ distinct symbols, and these positions are populated 
by the $B' = \alpha'(\alpha'+1)$ message symbols. 

The following theorems show that this code is (s, t)-resilient 
during regeneration as well as reconstruction, for all 
values of s and t. 

Theorem 1 (Node-Regeneration): In the MSR code presented, 
the $\alpha$ symbols stored in any node can be regenerated 
by downloading $\beta$ symbols each from any other $\Delta = d + s + 2t$ 
nodes, in the presence of up to s (block) erasures and t (block) 
errors. 

Proof: Since we consider only block errors and erasures, 
it is sufficient to describe the regeneration algorithm for the 
code with $\beta' = 1$, and the same algorithm is applied in parallel 
to obtain the regeneration algorithm for the desired code. 

Consider failure of node f in the system, and let 
$[\phi_f^t \;\; \lambda_f \phi_f^t]$ 
be the row of $\Psi$ corresponding to the failed node. Thus the 
$\alpha'$ symbols stored in node f are 

$$[\phi_f^t \;\; \lambda_f \phi_f^t]\, M = \phi_f^t S_1 + \lambda_f \phi_f^t S_2. \qquad (6)$$

The replacement for the failed node f connects to an 
arbitrary set $\{h_j \mid j = 1, \ldots, \Delta\}$ of $\Delta$ nodes. To facilitate 
regeneration of the data of node f, node $h_j$ computes the inner 
product $\psi_{h_j}^t M \phi_f$ and passes on this value to the replacement 
node. Letting $m_f = M \phi_f$, we can write the symbol passed 
by node $h_j$ as $\psi_{h_j}^t m_f$. Thus the $\Delta$ symbols obtained at the 
destination are $\Psi_{\mathrm{reg}}\, m_f$, where 

$$\Psi_{\mathrm{reg}} = [\psi_{h_1} \;\; \psi_{h_2} \;\; \cdots \;\; \psi_{h_\Delta}]^t.$$

Since any d rows of $\Psi$ are linearly independent by construction, 
and since $\Psi_{\mathrm{reg}}$ comprises a subset of the rows of $\Psi$, 
$\Psi_{\mathrm{reg}}\, m_f$ is simply an MDS encoding of the d symbols in the 
vector $m_f$. It follows that this code has a minimum distance 
of $\Delta - d + 1 = s + 2t + 1$, which allows us to recover $m_f$ 
using standard decoding algorithms [16] in the presence of 
up to s erasures and t errors. Thus the replacement node now 
has access to 

$$m_f = M \phi_f = \begin{bmatrix} S_1 \phi_f \\ S_2 \phi_f \end{bmatrix}.$$

Since $S_1$ and $S_2$ are symmetric matrices, the replacement node 
has equivalently acquired $\phi_f^t S_1$ and $\phi_f^t S_2$. Using this it can 
obtain 

$$\phi_f^t S_1 + \lambda_f \phi_f^t S_2, \qquad (7)$$

which is precisely the data previously stored in node f. ■ 
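
The regeneration procedure can be exercised on a toy instance. The Python sketch below is an illustration under stated assumptions rather than the authors' implementation: it fixes n = 6, k = 3, d = 4, $\alpha' = 2$ (the parameters of Fig. 3), works over the prime field GF(13), and takes $\Psi$ to be a Vandermonde matrix on the points 1, ..., 6, whose squares are distinct modulo 13 so that the condition on $\Lambda$ holds. Only erasures are handled (t = 0), by solving a d x d linear system on any d surviving helper symbols; all function names are hypothetical. It requires Python 3.8 or later for the modular inverse `pow(a, -1, p)`.

```python
P = 13                      # prime field size (assumption for this sketch)
D, ALPHA = 4, 2             # d = 2k - 2 and alpha' = k - 1 for k = 3
X = [1, 2, 3, 4, 5, 6]      # evaluation points; x^2 is distinct mod 13
PSI = [[pow(x, e, P) for e in range(D)] for x in X]   # row i = [phi_i, lambda_i * phi_i]
PHI = [row[:ALPHA] for row in PSI]
LAM = [pow(x, ALPHA, P) for x in X]

def solve_mod(A, y, p=P):
    """Solve the square system A x = y over GF(p) by Gauss-Jordan elimination."""
    n = len(A)
    aug = [[v % p for v in row] + [rhs % p] for row, rhs in zip(A, y)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col])
        aug[col], aug[piv] = aug[piv], aug[col]
        inv = pow(aug[col][col], -1, p)
        aug[col] = [v * inv % p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col]:
                factor = aug[r][col]
                aug[r] = [(a - factor * b) % p for a, b in zip(aug[r], aug[col])]
    return [row[n] for row in aug]

def encode(msg):
    """Place 6 message symbols into symmetric S1, S2; return M = [S1; S2] and C = PSI * M."""
    a, b, c, u, v, w = msg
    M = [[a, b], [b, c], [u, v], [v, w]]
    C = [[sum(x * m for x, m in zip(row, col)) % P for col in zip(*M)] for row in PSI]
    return M, C

def helper_symbol(C, h, f):
    """Helper h sends psi_h^t M phi_f, computed from its own stored row of C."""
    return sum(c * phi for c, phi in zip(C[h], PHI[f])) % P

def regenerate(f, received):
    """received maps helper index -> symbol; erased helpers are absent (needs >= d entries)."""
    helpers = sorted(received)[:D]
    m_f = solve_mod([PSI[h] for h in helpers], [received[h] for h in helpers])
    s1_phi, s2_phi = m_f[:ALPHA], m_f[ALPHA:]
    return [(u + LAM[f] * v) % P for u, v in zip(s1_phi, s2_phi)]

if __name__ == "__main__":
    M, C = encode([1, 2, 3, 4, 5, 6])
    received = {h: helper_symbol(C, h, 0) for h in range(1, 6)}  # Delta = d + 1 helpers
    del received[3]                                              # one erasure
    print(regenerate(0, received), C[0])
```

Note that each helper computes its symbol without knowing which other nodes are helping, which is what makes the seamless reconfiguration of Scenario 1 possible.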
Theorem 2 (Data-Reconstruction): In the MSR code presented, 
a data-collector can reconstruct all the B message 
symbols by downloading data stored in any $\kappa = k + s + 2t$ 
nodes in the presence of up to s (block) erasures and t (block) 
errors. 

Proof (sketch): The data reconstruction property of the 
code in the error-free case, as shown in [2], implies that the 
data passed by the $\kappa$ nodes are MDS over the finite field 
$\mathbb{F}_{q^{\alpha}}$. Over this finite field, the message is of size k, and the 
minimum distance of this MDS code is $\kappa - k + 1 = s + 2t + 1$. 
This guarantees reconstruction of the k source symbols over 
$\mathbb{F}_{q^{\alpha}}$, and equivalently the $k\alpha = B$ source symbols over $\mathbb{F}_q$, in 
the presence of up to s erasures and t errors. 

We omit the details on explicit decoding algorithms due to 
space constraints. ■ 



B. (s, t)-resilient MBR Codes 

MBR codes achieve the minimum possible download during 
regeneration: a replacement node downloads only what it 
stores, resulting in $d\beta = \alpha$. To meet the bound in (1) with 
equality (in the absence of errors/erasures), an MBR code must 
satisfy 

$$B = \left(kd - \binom{k}{2}\right)\beta, \qquad \alpha = d\beta. \qquad (8)$$

In this section we present explicit constructions of MBR 
codes for all parameter values [n, k, d] which are (s, t)-resilient, 
for all s and t simultaneously. As in the case of MSR 
codes, B and $\alpha$ are multiples of $\beta$, and we first construct codes 
for 

$$B' = kd - \binom{k}{2}, \qquad \alpha' = d, \qquad \beta' = 1. \qquad (9)$$

The desired code can be obtained by concatenating $\beta$ copies 
of this code. 

The PM-MBR code in [2] has an identical form $C = \Psi M$ 
as the PM-MSR code. The MBR code has the $(n \times d)$ encoding 
matrix of the form $\Psi = [\Phi \;\; \Sigma]$, where $\Phi$ is an $(n \times k)$ 
matrix satisfying: (a) any k rows of $\Phi$ are linearly independent, 
(b) any d rows of $\Psi$ are linearly independent. For instance, 
one can choose $\Psi$ to be a Vandermonde matrix. The $(d \times d)$ 
message matrix M is symmetric and consists of the B' 
message symbols arranged in the following manner: 

$$M = \begin{bmatrix} S & T^t \\ T & 0 \end{bmatrix}.$$

Here, the $((d - k) \times k)$ matrix T and the $(k \times k)$ symmetric 
matrix S contain the $B' = kd - \binom{k}{2} = k(d - k) + \binom{k+1}{2}$ 
message symbols as their elements. 
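
To make the arrangement of the message symbols concrete, the Python sketch below fills the symmetric $(d \times d)$ matrix M from a flat list of B' symbols and encodes it with a Vandermonde $\Psi$. The parameters (n = 5, k = 2, d = 3, as in Fig. 1), the field GF(7), the evaluation points and the function names are assumptions made for illustration only.

```python
P = 7                        # prime field size (assumption for this sketch)
K, D = 2, 3                  # k and d
X = [1, 2, 3, 4, 5]          # n = 5 distinct evaluation points in GF(7)
PSI = [[pow(x, e, P) for e in range(D)] for x in X]   # Vandermonde encoding matrix

def message_matrix(msg, k=K, d=D):
    """Arrange B' = k*d - k*(k-1)/2 symbols into the symmetric matrix [[S, T^t], [T, 0]]."""
    assert len(msg) == k * d - k * (k - 1) // 2
    M = [[0] * d for _ in range(d)]
    symbols = iter(msg)
    for i in range(k):                 # S: upper triangle of the k x k block, mirrored
        for j in range(i, k):
            M[i][j] = M[j][i] = next(symbols)
    for i in range(k, d):              # T: the (d - k) x k block, with T^t mirrored above
        for j in range(k):
            M[i][j] = M[j][i] = next(symbols)
    return M

def encode(M, psi=PSI, p=P):
    """Code matrix C = PSI * M; row i holds the alpha' = d symbols stored in node i."""
    return [[sum(a * b for a, b in zip(row, col)) % p for col in zip(*M)] for row in psi]

if __name__ == "__main__":
    M = message_matrix([1, 2, 3, 4, 5])
    for row in M:
        print(row)
    print(encode(M))
```

The symbol count matches the formula above: the symmetric block S contributes k(k+1)/2 free entries and T contributes k(d - k), with the lower-right block fixed to zero.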

The following theorems show that this code is (s, t)- 
resilient during regeneration as well as reconstruction, for all 
values of s and t. 

Theorem 3 (Node-Regeneration): In the MBR code presented, 
the $\alpha$ symbols stored in any node can be regenerated 
by downloading $\beta$ symbols each from any $\Delta = d + s + 2t$ 
nodes, in the presence of up to s (block) erasures and t (block) 
errors. 

Proof: As in the case of MSR, it is sufficient to describe 
the regeneration algorithm for the code with $\beta' = 1$. Consider 
failure of node f in the system, and let $\psi_f^t$ be the row of $\Psi$ 
corresponding to the failed node. Thus the $\alpha'$ symbols stored in 
node f are $\psi_f^t M$. We follow the notation used in the proof of 
node-regeneration for MSR codes (Section III-A). 
The helper node $h_j$ passes the symbol $\psi_{h_j}^t M \psi_f$. Denoting 
$m_f = M \psi_f$, the $\Delta$ symbols obtained at the destination can 
be written as $\Psi_{\mathrm{reg}}\, m_f$, where 

$$\Psi_{\mathrm{reg}} = [\psi_{h_1} \;\; \psi_{h_2} \;\; \cdots \;\; \psi_{h_\Delta}]^t.$$

By construction, $\Psi_{\mathrm{reg}}\, m_f$ corresponds to an MDS encoding of the 
vector $m_f$. As in the case of MSR, this code has a minimum 
distance of $\Delta - d + 1 = s + 2t + 1$, which allows us to recover 
$m_f$ in the presence of up to s erasures and t errors. Since 
the message matrix M is symmetric, $m_f^t$ is precisely the $\alpha'$ 
symbols required. ■ 
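
The following Python sketch illustrates error correction during MBR regeneration on a toy instance. It is not the decoder the paper has in mind: in place of a standard Reed-Solomon decoding algorithm it uses a brute-force search that solves for $m_f$ on every d-subset of the received symbols and accepts a candidate that agrees with all but at most t of them. This is correct for any MDS code with minimum distance at least 2t + 1 but takes exponential time in general. The parameters (n = 6, d = 3, GF(7), Vandermonde $\Psi$) and all function names are assumptions for illustration; Python 3.8 or later is needed for `pow(a, -1, p)`.

```python
from itertools import combinations

P, D = 7, 3                  # prime field size and d (assumptions for this sketch)
X = [1, 2, 3, 4, 5, 6]       # n = 6 distinct evaluation points in GF(7)
PSI = [[pow(x, e, P) for e in range(D)] for x in X]

def dot(u, v, p=P):
    """Inner product over GF(p)."""
    return sum(a * b for a, b in zip(u, v)) % p

def solve_mod(A, y, p=P):
    """Solve the square system A x = y over GF(p) by Gauss-Jordan elimination."""
    n = len(A)
    aug = [[v % p for v in row] + [rhs % p] for row, rhs in zip(A, y)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col])
        aug[col], aug[piv] = aug[piv], aug[col]
        inv = pow(aug[col][col], -1, p)
        aug[col] = [v * inv % p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col]:
                factor = aug[r][col]
                aug[r] = [(a - factor * b) % p for a, b in zip(aug[r], aug[col])]
    return [row[n] for row in aug]

def regenerate_mbr(received, t):
    """received maps helper index -> symbol (erased helpers absent, up to t symbols wrong).
    Returns m_f, which equals the failed node's stored row because M is symmetric."""
    helpers = sorted(received)
    for subset in combinations(helpers, D):
        cand = solve_mod([PSI[h] for h in subset], [received[h] for h in subset])
        agree = sum(dot(PSI[h], cand) == received[h] for h in helpers)
        if agree >= len(helpers) - t:
            return cand
    raise ValueError("no candidate is consistent with at most t errors")

if __name__ == "__main__":
    M = [[1, 2, 4], [2, 3, 5], [4, 5, 0]]            # symmetric message matrix
    C = [[dot(row, col) for col in zip(*M)] for row in PSI]
    f = 0
    received = {h: dot(C[h], PSI[f]) for h in range(1, 6)}   # Delta = d + 2t = 5 helpers
    received[2] = (received[2] + 3) % P                      # one adversarial error
    print(regenerate_mbr(received, t=1), C[f])
```

The same routine handles erasures by simply omitting the erased helpers from `received`, which reflects the simultaneous resilience to any mix of s erasures and t errors with s + 2t at most $\Delta - d$.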
Theorem 4 (Data-Reconstruction): In the MBR code presented, 
a data-collector can reconstruct all the B message 
symbols by downloading data stored in any $\kappa = k + s + 2t$ 
nodes in the presence of up to s (block) erasures and t (block) 
errors. 

Proof (sketch): As in the MSR case, the proof exploits the 
data-reconstruction property of PM-MBR codes in the error-free 
case [2]. The reconstruction property implies that, over 
$\mathbb{F}_{q^{\alpha}}$, the minimum distance of the code is $\kappa - k + 1 = s + 2t + 1$. 
This guarantees reconstruction of the k symbols over $\mathbb{F}_{q^{\alpha}}$, and 
equivalently the B source symbols over $\mathbb{F}_q$, in the presence of 
up to s erasures and t errors. Again, we omit details on explicit 
decoding algorithms due to space constraints. ■ 